An Incremental Architecture for the Semantic Annotation of Dialogue Corpora with High-Level Structures. A case of study for the MEDIA corpus
Abstract
The semantic annotation of dialogue corpora makes it possible to build efficient language understanding applications that support enjoyable and effective human-machine interactions. Nevertheless, the annotation process can be costly, time-consuming and complicated, particularly as the semantic formalism becomes more expressive. In this work, we propose a bootstrapping architecture for the semantic annotation of dialogue corpora with rich structures, based on Dependency Syntax and Frame Semantics.
Similar resources
Semantic Frame Annotation on the French MEDIA corpus
This paper introduces a knowledge representation formalism used for annotation of the French MEDIA dialogue corpus in terms of high level semantic structures. The semantic annotation, worked out according to the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process for composing semantic structures from basic semantic constituents us...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Portability of Semantic Annotations for Fast Development of Dialogue Corpora
Generalization of spoken dialogue systems increases the need for fast development of spoken language understanding modules for semantic tagging of speaker’s turns. Statistical methods are performing well for this task but require large corpora to be trained. Collecting such corpora is expensive in time and human expertise. In this paper we propose a semi-automatic annotation process for fast pr...
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora
The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems’ capa...
Using MMIL for the High Level Semantic Annotation of the French MEDIA Dialogue Corpus
The MultiModal Interface Language formalism (MMIL) has been selected as the High Level Semantic (HLS) formalism for annotating the French MEDIA dialogue corpus. This corpus is composed of human-machine dialogues in the domain of hotel reservation and tourist information. Utterances in dialogues have been previously annotated with a concept-value flat semantics for studying and evaluating spoken...